Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers

Authors

  • Jarek Duda
  • Wojciech Szpankowski
  • Ananth Grama
Abstract

Motivation: Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge to their effective use. In this paper, we present a novel information-theoretic analysis of the impact of insertion-deletion (InDel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given InDel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) what is the number of 'typical' sequences within the distortion bound induced by InDel errors; and (iii) using repeated extrusion through the nanopore, how many repetitions are needed to reduce the distortion bound so that only one typical sequence exists within it.

Results: Our results provide a number of important insights: (i) the maximum length of a sequence that can be accurately reconstructed in the presence of InDel errors is relatively small; (ii) the number of typical sequences within the distortion bound is large; and (iii) repeated extrusion is an effective technique for unique reconstruction. In particular, we show that the number of repeats grows slowly (logarithmically) with sequence length, implying that through repeated extrusion we can sequence long reads using nanopore sequencers. InDel errors are the primary error mode for nanopore sequencers. Consequently, the results in this paper can be viewed as (tight) bounds on reconstruction lengths and on the number of repetitions needed for accurate reconstruction.

Contact: [email protected]
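As a rough illustration of why single-pass reconstruction length is limited, the sketch below uses a deliberately naive model that is not taken from the paper: each base is assumed to suffer an InDel error independently with a fixed rate, so a read of length n is error-free with probability (1 - p)^n. The function names and the example rate are hypothetical, and the paper's information-theoretic bounds are not reproduced here.

```python
import math


def error_free_probability(length: int, indel_rate: float) -> float:
    """Probability that one pass over `length` bases contains no InDel error.

    Assumption (not from the paper): errors are independent per base
    and occur with the same rate `indel_rate` at every position.
    """
    return (1.0 - indel_rate) ** length


def max_length_for_confidence(indel_rate: float, confidence: float) -> int:
    """Largest length whose error-free probability is at least `confidence`.

    Solves (1 - p)^n >= c for n; log(1 - p) is negative, so the
    inequality flips when dividing.
    """
    return math.floor(math.log(confidence) / math.log(1.0 - indel_rate))


if __name__ == "__main__":
    rate = 0.1  # hypothetical per-base InDel rate, chosen only for illustration
    for n in (5, 10, 50, 100):
        print(n, error_free_probability(n, rate))
    print("max length at 50% confidence:", max_length_for_confidence(rate, 0.5))
```

Under this toy model the error-free probability decays exponentially with length, which is consistent in spirit with the abstract's first insight; how repeated extrusion changes the picture depends on the reconstruction method analysed in the paper.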

Similar resources

Nanopore-based Fourth-generation DNA Sequencing Technology

Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis ope...

Nanopore-CMOS Interfaces for DNA Sequencing

DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews ex...

Enrichment by hybridisation of long DNA fragments for Nanopore sequencing

Enrichment of DNA by hybridisation is an important tool which enables users to gather target-focused next-generation sequence data in an economical fashion. Current in-solution methods capture short fragments of around 200-300 nt, potentially missing key structural information such as recombination or translocations often found in viral or bacterial pathogens. The increasing use of long-read th...

Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately character...

Training alignment parameters for arbitrary sequencers with LAST-TRAIN

Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: The source code is freely available ...

Journal:
  • CoRR

Volume abs/1601.02420  Issue 

Pages  -

Publication date 2015